Preprocessing: A Prerequisite for Discovering Patterns in Web Usage Mining Process
Abstract
Web log data is usually diverse and voluminous. Before it can be used for pattern discovery, this data must be assembled into a consistent, integrated and comprehensive view. Without properly cleaning, transforming and structuring the data prior to analysis, one cannot expect to find meaningful patterns. As in most data mining applications, data preprocessing involves removing and filtering redundant and irrelevant data, removing noise, transforming the data, and resolving any inconsistencies. This paper proposes a complete preprocessing methodology comprising merging, data cleaning, user/session identification, and data formatting and summarization. The methodology improves the quality of the data by reducing its quantity. Several experiments were conducted to validate the efficiency of the proposed methodology. The results show that it reduces the size of Web access log files down to 73-82% of the initial size and yields richer logs that are structured for the further stages of Web Usage Mining (WUM). Preprocessing of raw data in the WUM process is therefore the central theme of this paper.
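The merging, cleaning and user/session identification stages listed in the abstract can be illustrated with a small sketch. It is not the paper's method. The filtering rules are common heuristics assumed here: drop image, stylesheet and script requests, non-GET methods, and non-2xx responses. Users are identified by IP address plus user agent, and sessions are split with a 30-minute inactivity timeout; both are also assumptions. LogEntry, merge_logs, clean and sessionize are hypothetical names. The data formatting and summarization stage is omitted. The reduction figure counts log entries rather than bytes of file size.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed filtering rules and timeout (illustrative heuristics, not from the paper).
IRRELEVANT_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
SESSION_TIMEOUT = timedelta(minutes=30)


@dataclass
class LogEntry:  # hypothetical record for one parsed access-log line
    ip: str
    agent: str
    time: datetime
    method: str
    url: str
    status: int


def merge_logs(*logs):
    """Merging: combine several server logs into one list ordered by time."""
    merged = [entry for log in logs for entry in log]
    merged.sort(key=lambda e: e.time)
    return merged


def clean(entries):
    """Cleaning: keep successful GET requests for pages, drop embedded resources."""
    return [
        e for e in entries
        if e.method == "GET"
        and 200 <= e.status < 300
        and not e.url.lower().endswith(IRRELEVANT_SUFFIXES)
    ]


def sessionize(entries):
    """User/session identification: a user is an (IP, agent) pair; a new session
    starts when the gap since that user's previous request exceeds the timeout."""
    sessions = []
    last_seen = {}  # user key -> (time of last request, index into sessions)
    for e in sorted(entries, key=lambda e: e.time):
        user = (e.ip, e.agent)
        prev = last_seen.get(user)
        if prev is None or e.time - prev[0] > SESSION_TIMEOUT:
            sessions.append([e.url])
            idx = len(sessions) - 1
        else:
            idx = prev[1]
            sessions[idx].append(e.url)
        last_seen[user] = (e.time, idx)
    return sessions


# Usage with two small made-up server logs.
t0 = datetime(2011, 5, 1, 10, 0)
log_a = [
    LogEntry("10.0.0.1", "Mozilla", t0, "GET", "/index.html", 200),
    LogEntry("10.0.0.1", "Mozilla", t0 + timedelta(seconds=1), "GET", "/logo.png", 200),
    LogEntry("10.0.0.1", "Mozilla", t0 + timedelta(minutes=5), "GET", "/products.html", 200),
]
log_b = [
    LogEntry("10.0.0.2", "Opera", t0 + timedelta(minutes=1), "GET", "/missing.html", 404),
    LogEntry("10.0.0.1", "Mozilla", t0 + timedelta(hours=1), "GET", "/contact.html", 200),
    LogEntry("10.0.0.2", "Opera", t0 + timedelta(minutes=2), "POST", "/form", 200),
]

merged = merge_logs(log_a, log_b)
cleaned = clean(merged)
sessions = sessionize(cleaned)
reduction = 100 * (1 - len(cleaned) / len(merged))
print(sessions)  # [['/index.html', '/products.html'], ['/contact.html']]
print(f"{reduction:.0f}% of entries removed")  # 50% of entries removed
```

Keying users on IP plus user agent is a cheap way to separate different browsers behind one proxy address. The one-hour gap before "/contact.html" exceeds the assumed timeout, so that request opens a second session for the same user.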
Similar resources
Preprocessing: A Prerequisite for Discovering Patterns in WUM Process
Web log data is usually diverse and voluminous. This data must be assembled into a consistent, integrated and comprehensive view, in order to be used for pattern discovery. Without properly cleaning, transforming and structuring the data prior to the analysis, one cannot expect to find meaningful patterns. As in most data mining applications, data preprocessing involves removing and filtering r...
A Survey on Preprocessing Methods for Web Usage Data
The World Wide Web is a huge repository of web pages and links. It provides an abundance of information for Internet users. The growth of the web is tremendous, as approximately one million pages are added daily. Users’ accesses are recorded in web logs. Because of the tremendous usage of the web, web log files are growing at a faster rate and becoming huge. Web data mining is the applicati...
A Neoteric Web Recommender System based on Approach of Mining Frequent Sequential Pattern from Customized Web Log Preprocessing
A challenging real-world task for the webmaster of an organization is to match users' needs and keep their attention on the organization's web site. So the only option is to capture the intuition of the user and provide them with a recommendation list. Web usage mining is a kind of data mining method that provides intelligent personalized online services such as web recommendations; it is usually necessa...
An Efficient Preprocessing Methodology for Discovering Patterns and Clustering of Web Users using a Dynamic ART1 Neural Network
Abstract: In this paper, a complete preprocessing methodology for discovering patterns in the web usage mining process, which improves the quality of data by reducing its quantity, has been proposed. A dynamic ART1 neural network clustering algorithm with a neat architecture, which groups users according to their Web access patterns, is also proposed. Several experiments are conducted and the result...
Fuzzy Equivalent Matrix for Discovering Patterns of Web Users Navigation
The World Wide Web provides an abundance of information for Internet users and is a huge repository of web pages and links. The growth of the web is tremendous, as approximately one million pages are added daily. Web logs record users’ accesses. Because of the tremendous usage of the web, web log files are growing at a faster rate and becoming huge. Web data mining is the application of d...
Journal title:
- CoRR
Volume: abs/1105.0350
Publication date: 2011